Explainable AI-Driven Heart Disease Prediction Using Ensemble Learning and SHAP-Based Clinical Interpretation

Authors: Sourav Angre, Ritesh Patil, Prasad Yeole, Mrs. Varsha Dharmadhikari

DOI Link: https://doi.org/10.22214/ijraset.2026.83848

Abstract

Heart disease remains one of the leading causes of mortality worldwide, emphasizing the need for accurate and interpretable predictive systems that can support early diagnosis and clinical decision-making. While machine learning techniques have demonstrated strong predictive capabilities in cardiovascular disease detection, their adoption in healthcare is often limited by the lack of transparency and explainability in model predictions. This study proposes an Explainable Artificial Intelligence (XAI)-driven framework for heart disease prediction using clinical parameters and ensemble learning techniques. The framework utilizes the UCI Heart Disease Dataset, comprising 1,025 patient records with 13 clinically relevant attributes, including age, chest pain type, cholesterol level, resting blood pressure, maximum heart rate achieved, and exercise-induced angina. Two machine learning models, namely Logistic Regression and Random Forest, are developed and evaluated for binary classification of heart disease risk. To enhance model transparency and clinical trust, SHapley Additive Explanations (SHAP) are integrated to provide both global and patient-level interpretations of model predictions. Global explanations identify the most influential clinical factors affecting prediction outcomes, while local explanations provide individualized reasoning for specific patient predictions. Experimental results demonstrate that the Random Forest model outperforms Logistic Regression, achieving superior accuracy, precision, recall, and F1-score. Receiver Operating Characteristic (ROC) analysis further confirms the strong discriminative capability of the proposed framework, achieving an Area Under the Curve (AUC) of 0.857. SHAP-based analysis reveals that clinical attributes such as the number of major vessels (ca), chest pain type (cp), thalassemia status (thal), ST depression induced by exercise (oldpeak), and maximum heart rate achieved (thalach) are among the most influential predictors of heart disease. The proposed framework not only delivers reliable predictive performance but also provides interpretable and clinically meaningful explanations, making it a practical decision-support tool for healthcare professionals.

Introduction

Cardiovascular diseases (CVDs) are among the leading causes of global mortality, creating major health and economic challenges. Early detection of heart disease risk is essential for timely treatment and improved patient outcomes. Traditional diagnosis methods depend on clinical expertise, laboratory tests, and medical imaging, but the increasing availability of healthcare data has encouraged the use of Machine Learning (ML) for automated risk prediction.

Machine learning algorithms such as Logistic Regression, Support Vector Machines, Decision Trees, and Random Forest can identify hidden patterns in clinical data and assist healthcare professionals in decision-making. However, many high-performing ML models lack transparency and operate as black-box systems, reducing clinician trust.

To overcome this limitation, Explainable Artificial Intelligence (XAI) techniques, especially SHapley Additive Explanations (SHAP), are used to provide explanations for model predictions. SHAP identifies the contribution of individual clinical features to prediction outcomes, enabling both overall model interpretation and patient-specific explanations.

The proposed research develops an Explainable AI-based heart disease prediction framework using clinical data from the UCI Heart Disease Dataset. It combines machine learning classification with SHAP-based explainability to achieve accurate, transparent, and clinically meaningful predictions.

Research Contributions

The study focuses on:

Developing an explainable heart disease prediction system using clinical parameters.
Comparing Logistic Regression and Random Forest models for classification.
Integrating SHAP to explain feature importance and individual predictions.
Identifying important cardiovascular risk factors.
Improving transparency and trust in AI-based healthcare decision systems.

Literature Review Summary

Traditional Prediction Methods

Early heart disease prediction relied on statistical methods such as Logistic Regression and rule-based systems. These approaches were simple and interpretable but had limitations in handling complex nonlinear relationships between multiple clinical factors.

Common clinical indicators include:

Age
Blood pressure
Cholesterol level
Chest pain type
ECG results
Heart rate

Machine Learning-Based Prediction

Machine learning has improved disease prediction by analyzing large healthcare datasets and identifying complex patterns. Common algorithms include:

Logistic Regression
Support Vector Machines (SVM)
K-Nearest Neighbors (KNN)
Decision Trees
Naïve Bayes
Artificial Neural Networks

Although these models often provide higher accuracy than traditional methods, many lack interpretability, which is critical in medical applications.

Ensemble Learning and Random Forest

Random Forest is an ensemble learning technique that combines multiple decision trees to improve accuracy and reduce overfitting. It is effective for healthcare applications because it:

Handles complex feature relationships.
Works well with diverse clinical data.
Provides robust predictions despite noise or missing values.

Studies show Random Forest often performs better than traditional classifiers in heart disease prediction tasks.

Explainable Artificial Intelligence (XAI)

XAI improves trust in machine learning systems by explaining why a model produces a specific prediction. SHAP is a widely used XAI method based on cooperative game theory.

SHAP provides:

Global explanations: Overall importance of clinical features.
Local explanations: Reasons behind an individual patient's prediction.

This improves model transparency and helps clinicians understand AI-assisted decisions.
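The game-theoretic idea behind SHAP can be illustrated with a small, self-contained sketch. It computes exact Shapley values for a toy risk function by averaging each feature's marginal contribution over all feature orderings, then averages absolute values across patients to get a global ranking. The function `toy_risk`, its coefficients, the binary feature flags, and the single all-zero baseline patient are illustrative assumptions, not the study's model. The SHAP library itself estimates these quantities against a background dataset rather than one baseline row.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)            # start from the baseline patient
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # reveal this patient's value for one feature
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_risk(x):
    """Hypothetical risk score with one interaction term (not the paper's model)."""
    return 0.1 + 0.2 * x["ca"] + 0.3 * x["cp"] + 0.05 * x["oldpeak"] + 0.1 * x["ca"] * x["cp"]

baseline = {"ca": 0, "cp": 0, "oldpeak": 0}
patient = {"ca": 1, "cp": 1, "oldpeak": 1}

# Local explanation: contributions for one patient sum to f(patient) - f(baseline).
phi = shapley_values(toy_risk, patient, baseline)

# Global explanation: mean absolute contribution across several patients.
patients = [patient, {"ca": 1, "cp": 0, "oldpeak": 0}]
local = [shapley_values(toy_risk, p, baseline) for p in patients]
global_importance = {
    name: sum(abs(expl[name]) for expl in local) / len(local) for name in baseline
}
```

In this toy case the interaction term is shared equally between `ca` and `cp`, and the three contributions add up to the gap between the patient's score and the baseline score. That additivity is the property that makes patient-level explanations straightforward to read.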

Research Gap

Existing heart disease prediction studies mainly focus on improving accuracy while giving limited attention to explainability. Many systems provide only general feature importance rather than patient-level explanations.

The research addresses this gap by combining:

Random Forest prediction capability.
SHAP-based interpretability.
Clinically meaningful explanations.

Proposed Methodology

Dataset

The study uses the UCI Heart Disease Dataset, containing:

1,025 patient records
13 clinical features
1 target variable indicating presence or absence of heart disease.

Clinical features include:

Age
Sex
Chest pain type
Resting blood pressure
Cholesterol
Blood sugar
ECG results
Maximum heart rate
Exercise-induced angina
ST depression
ST slope
Major vessels
Thalassemia

The problem is treated as a binary classification task.

Data Preprocessing

The preprocessing steps include:

Checking data quality and consistency.
Removing errors or inconsistencies.
Separating input features and target labels.
Splitting data into:
- 80% training data
- 20% testing data
Preparing the dataset for model training and evaluation.
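The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the records are randomly generated placeholders rather than real patient data, the column names `age`, `chol`, and `target` are hypothetical, and dropping rows with missing values is an assumed cleaning rule, since the text does not say how inconsistencies were handled. The helper names and the fixed seed are also illustrative.

```python
import random

def drop_incomplete(records):
    """Keep only records in which every field has a value (assumed cleaning rule)."""
    return [r for r in records if all(v is not None for v in r.values())]

def split_features_target(records, target_key="target"):
    """Separate input features from the target label."""
    X = [{k: v for k, v in r.items() if k != target_key} for r in records]
    y = [r[target_key] for r in records]
    return X, y

def train_test_split(X, y, test_fraction=0.2, seed=42):
    """Shuffle row indices with a fixed seed, then hold out test_fraction of the rows."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Synthetic placeholder records (random values, not real patient data).
rng = random.Random(0)
records = [
    {"age": rng.randint(30, 80), "chol": rng.randint(120, 400), "target": rng.randint(0, 1)}
    for _ in range(1025)
]
records.append({"age": None, "chol": 200, "target": 1})  # one deliberately incomplete row

clean = drop_incomplete(records)
X, y = split_features_target(clean)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))  # 820 205
```

Fixing the shuffle seed keeps the split reproducible, so both models are compared on the same held-out patients.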

Model Development

1. Logistic Regression

Logistic Regression is used as a baseline classification model. It predicts the probability of heart disease using clinical features.

Advantages:

Simple and interpretable.
Computationally efficient.
Suitable for binary classification.

Limitation:

Cannot effectively capture complex nonlinear relationships.
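As a rough illustration of how such a baseline produces a probability, the sketch below fits a logistic regression by batch gradient descent on a tiny toy dataset. The two feature names, their pre-scaled values, the learning rate, and the epoch count are all assumptions for illustration. The study's actual training setup is not described in this summary.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Estimated probability of the positive class for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log-loss w.r.t. the linear score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical, already-scaled toy features: [st_depression, max_heart_rate].
X = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.7, 0.3], [0.8, 0.2], [0.9, 0.1]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
p_low = predict_proba(w, b, [0.1, 0.9])   # patient resembling the negative class
p_high = predict_proba(w, b, [0.9, 0.1])  # patient resembling the positive class
```

The learned weights are directly readable: a positive weight raises the predicted probability as the feature grows, which is why the model is considered interpretable. The decision boundary is a single linear combination of the features, which is also the source of the limitation noted above.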

2. Random Forest

Random Forest is an ensemble model that combines multiple decision trees to improve prediction accuracy.

Advantages:

Reduces overfitting.
Captures nonlinear relationships.
Handles complex healthcare data effectively.
Provides strong classification performance.
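The two ingredients that give Random Forest its behavior, bootstrap sampling of the training rows and majority voting across trees trained on randomly chosen features, can be sketched in a few lines. This is a deliberately simplified stand-in, not the study's implementation: each "tree" is a depth-1 decision stump on one randomly selected feature, whereas a real Random Forest grows deeper trees and samples features at every split. The toy data, the tree count, and the seed are assumptions.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Choose the threshold and orientation on one feature with the fewest training errors."""
    values = sorted({row[feature] for row in X})
    best = None  # (errors, threshold, label_above)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for label_above in (0, 1):
            errors = sum(
                (label_above if row[feature] > threshold else 1 - label_above) != target
                for row, target in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    if best is None:  # every sampled value identical: always predict the majority class
        return feature, float("-inf"), Counter(y).most_common(1)[0][0]
    return feature, best[1], best[2]

def predict_stump(stump, row):
    feature, threshold, label_above = stump
    return label_above if row[feature] > threshold else 1 - label_above

def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        feature = rng.randrange(len(X[0]))        # random feature for this stump
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps (an odd tree count avoids ties)."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Toy data: class 0 has low values on both features, class 1 has high values.
X = [[0.04 * k, 0.04 * (9 - k)] for k in range(10)]
X += [[0.6 + 0.04 * k, 0.6 + 0.04 * (9 - k)] for k in range(10)]
y = [0] * 10 + [1] * 10

forest = fit_forest(X, y)
low_risk = predict_forest(forest, [0.0, 0.0])
high_risk = predict_forest(forest, [1.0, 1.0])
```

Because every stump sees a different resample of the data, individual stumps place their thresholds slightly differently. Averaging their votes is what reduces the variance of the final prediction.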

Conclusion

This study presented an Explainable AI-driven framework for heart disease prediction using clinical parameters and ensemble learning techniques. The proposed approach utilized the UCI Heart Disease Dataset comprising 1,025 patient records and thirteen clinically relevant attributes to develop a predictive system capable of identifying the presence of heart disease. Two machine learning algorithms, Logistic Regression and Random Forest, were investigated and evaluated using standard classification metrics.

Experimental results demonstrated that the Random Forest classifier outperformed Logistic Regression across all evaluation measures, including accuracy, precision, recall, and F1-score. The model achieved strong predictive performance while maintaining robust generalization capability. Furthermore, ROC analysis confirmed the effectiveness of the framework, achieving an Area Under the Curve (AUC) value of 0.857, indicating reliable discrimination between patients with and without heart disease.

A key contribution of this research is the integration of SHapley Additive Explanations (SHAP) to enhance model transparency and interpretability. Through both global and patient-level explanations, the framework identified clinically significant predictors such as the number of major vessels (ca), chest pain type (cp), thalassemia status (thal), ST depression induced by exercise (oldpeak), and maximum heart rate achieved (thalach). These explanations provide valuable insights into the factors influencing prediction outcomes and improve trust in machine learning-assisted clinical decision-making.

Unlike conventional black-box prediction systems, the proposed framework combines predictive accuracy with explainability, enabling healthcare professionals to understand the reasoning behind model decisions. This capability enhances transparency and supports the practical adoption of artificial intelligence in clinical environments. By providing interpretable predictions and clinically meaningful explanations, the framework serves as a reliable decision-support tool for heart disease risk assessment.

Future work may focus on evaluating the framework using larger and more diverse clinical datasets, incorporating additional machine learning and deep learning models, and implementing advanced validation strategies such as cross-validation and external dataset testing. Further improvements may also include real-time deployment within healthcare applications and the integration of additional explainability techniques to strengthen clinical usability and trust.


Copyright

Copyright © 2026 Sourav Angre, Ritesh Patil, Prasad Yeole, Mrs. Varsha Dharmadhikari. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET83848

Publish Date : 2026-06-20

ISSN : 2321-9653

Publisher Name : IJRASET
